At the end of this lesson, students should be able to:
- differentiate all types of variables.
- understand scale of measurements.
- describe and calculate the properties of data.
September 3, 2019
Statistics is the collection, analysis, interpretation and presentation of data to discover underlying causes, patterns, relationships and trends.
Two major branches of statistics:
Variables are measurements or observations that are typically numeric.
Three categories of variables:
The scale of measurement of a variable determines the kind of statistical procedures that can be used on it.
Four different scales of measurement:
State whether the variable is continuous or discrete, and quantitative or qualitative.
(1) Name the variable being measured, (2) state whether it is continuous or discrete, and (3) state whether it is quantitative or qualitative.
Measures of central tendency, also known as measures of central location, locate the center of a distribution.
Three kinds of averages of a data set to answer “where do the data center?”
Measures include:
| Patient number | Reduction in BP (mmHg) |
|---|---|
| 1 | 20 |
| 2 | 25 |
| 3 | 21 |
| 4 | 34 |
| 5 | 41 |
| 6 | 37 |
\[\begin{aligned} \text{Mean} &= (20 + 25 + 21 + 34 + 41 + 37)/6 \\ &= 178/6 \\ &= 29.67 \end{aligned}\]
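The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the list holds the six values used in the calculation, and the variable names are illustrative.

```python
from statistics import mean

# Reductions in BP (mmHg), as they appear in the calculation above
reductions = [20, 25, 21, 34, 41, 37]

total = sum(reductions)        # numerator of the mean
n = len(reductions)            # number of patients
sample_mean = total / n        # arithmetic mean

print(total, n, round(sample_mean, 2))

# The standard library gives the same result
library_mean = mean(reductions)
```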
| Number of volunteers | Pain assessment by volunteers |
|---|---|
| 2 | 3 (extreme pain) |
| 12 | 2 (moderate pain) |
| 6 | 1 (slight pain) |
\[\begin{aligned} \text{Mean} &= [(2 \times 3) + (12 \times 2) + (6 \times 1)] / 20 \\ &= 36/20 \\ &= 1.8 \end{aligned}\]
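A mean of this kind, where each score is weighted by how many volunteers gave it, could be sketched as follows. The counts (2, 12 and 6 volunteers) are taken from the table; the dictionary layout and names are just one possible choice.

```python
# Pain score -> number of volunteers who reported it (from the table)
pain_counts = {3: 2, 2: 12, 1: 6}

# Sum of (score x frequency) over all scores
weighted_sum = sum(score * count for score, count in pain_counts.items())

# Total number of volunteers
n_volunteers = sum(pain_counts.values())

weighted_mean = weighted_sum / n_volunteers

print(weighted_sum, n_volunteers, weighted_mean)
```

Note that the denominator is the total number of volunteers, not the number of rows in the table.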
| Diameter (mm) | Frequency | Midpoint (x) | f.x |
|---|---|---|---|
| 35-39 | 6 | 37 | 222 |
| 40-44 | 12 | 42 | 504 |
| 45-49 | 15 | 47 | 705 |
| 50-54 | 10 | 52 | 520 |
| 55-59 | 7 | 57 | 399 |
| Total | 50 | | 2350 |
- Mean \(= 2350 / 50 = 47\)
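For grouped data the individual values are unknown, so each class is represented by its midpoint. The sketch below reproduces the table under that assumption; the tuple layout and names are illustrative.

```python
# (lower limit, upper limit, frequency) for each diameter class in mm
classes = [
    (35, 39, 6),
    (40, 44, 12),
    (45, 49, 15),
    (50, 54, 10),
    (55, 59, 7),
]

# Midpoint x of each class: halfway between its limits
midpoints = [(low + high) / 2 for low, high, _ in classes]
freqs = [f for _, _, f in classes]

# f.x column and its total
fx = [f * x for f, x in zip(freqs, midpoints)]
sum_f = sum(freqs)
sum_fx = sum(fx)

grouped_mean = sum_fx / sum_f

print(midpoints, fx, sum_f, sum_fx, grouped_mean)
```

The result is an approximation of the true mean, since every observation in a class is treated as if it sat at the midpoint.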
| 24.8 | 22.8 | 24.6 | 192.5 | 25.2 | 18.5 | 23.7 |
Arrange the values in order:
| 18.5 | 22.8 | 23.7 | 24.6 | 24.8 | 25.2 | 192.5 |
Then select the middle number in the list, in this case 24.6.
In this sense, it locates the center of the data.
If the data set has an even number of measurements, there are two middle values; the median is the mean of those two.
Example:
| 18.5 | 22.8 | 23.7 | 24.6 | 24.8 | 25.2 | 28.9 | 192.5 |
Median: (24.6 + 24.8) / 2 = 24.7
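Both cases, odd and even, can be handled by one small function. The following is a sketch of the procedure described above (sort, then take the middle value or the mean of the two middle values); `median_by_hand` is a name made up for this example.

```python
from statistics import median

def median_by_hand(values):
    """Sort the values, then pick the middle one (odd n)
    or average the two middle ones (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Seven measurements (odd) and the same list with 28.9 added (even)
odd_data = [24.8, 22.8, 24.6, 192.5, 25.2, 18.5, 23.7]
even_data = odd_data + [28.9]

median_odd = median_by_hand(odd_data)
median_even = median_by_hand(even_data)

print(median_odd, round(median_even, 1))

# The standard library follows the same rule
library_odd = median(odd_data)
library_even = median(even_data)
```

The extreme value 192.5 has no effect on either median, because only the position of the middle values matters.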
Data set 1:
| -1 | 0 | 2 | 0 |
The mode of this data set is 0.
Data set 2:
| 2 | 2 | 3 | 1 | 1 | 5 |
The two most frequently observed values in this data set are 1 and 2, so the mode is a set of two values: {1, 2}.
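Finding the mode amounts to counting how often each value occurs and keeping every value that reaches the highest count. A minimal sketch, with `modes` as an illustrative helper name:

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data_set_1 = [-1, 0, 2, 0]
data_set_2 = [2, 2, 3, 1, 1, 5]

modes_1 = modes(data_set_1)   # a single mode
modes_2 = modes(data_set_2)   # two values tie for most frequent

print(modes_1, modes_2)
```

Returning a list rather than a single number keeps the bimodal case from being hidden.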
Weight of luggage presented by airline passengers at check-in (measured to the nearest kg).
| 18 | 23 | 20 | 21 | 24 | 23 | 20 | 20 | 15 | 19 | 24 |
Mode: 20
Median: 20
| 15 | 18 | 19 | 20 | 20 | 20 | 21 | 23 | 23 | 24 | 24 |
- Mean = (15+18+19+20+20+20+21+23+23+24+24) / 11 = 20.64
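All three measures for the luggage data can also be obtained from Python's `statistics` module, which is a convenient way to check hand calculations. The variable names below are illustrative.

```python
from statistics import mean, median, mode

# Luggage weights in kg, in the order they were recorded
weights_kg = [18, 23, 20, 21, 24, 23, 20, 20, 15, 19, 24]

luggage_mode = mode(weights_kg)       # most frequent value
luggage_median = median(weights_kg)   # middle of the sorted values
luggage_mean = mean(weights_kg)       # sum divided by count

print(luggage_mode, luggage_median, round(luggage_mean, 2))
```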
| Staff | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Salary | 15k | 18k | 16k | 14k | 15k | 15k | 12k | 17k | 90k | 95k |
- The mean salary is 30.7k
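The salary data show how a few extreme values pull the mean away from the bulk of the data. The sketch below recomputes the mean and, for comparison, the median; the median and the count of staff below the mean are computed here and are not stated in the original table.

```python
from statistics import mean, median

# Salaries in thousands, staff 1 to 10
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = mean(salaries_k)
median_salary = median(salaries_k)

# How many staff earn less than the mean?
staff_below_mean = sum(1 for s in salaries_k if s < mean_salary)

print(mean_salary, median_salary, staff_below_mean)
```

Eight of the ten staff earn less than the mean, which suggests that the median describes a typical salary better here.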
Measures of dispersion are also known as measures of variation.
How spread out are the data?
Describing quantitative data will not be complete without knowing how observed values are spread out from the average.
E.g., two classes that sat the same exam might have the same mean mark, but the marks may be spread differently around that mean.
Measures include:
| Set A | Set B |
|---|---|
| 10 | 28 |
| 20 | 29 |
| 30 | 30 |
| 40 | 31 |
| 50 | 32 |
| Mean: 30 | Mean: 30 |
On test A, the range of marks is 70-45=25.
On test B, the range of marks is 65-45=20.
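The range is simply the largest value minus the smallest. In the sketch below only the lowest and highest marks (45 and 70 for test A, 45 and 65 for test B) come from the text; the marks in between are hypothetical and are there only to make the example runnable.

```python
def value_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)

# Hypothetical marks: only the minimum and maximum are from the text
test_a = [45, 52, 58, 63, 70]
test_b = [45, 50, 55, 60, 65]

range_a = value_range(test_a)
range_b = value_range(test_b)

print(range_a, range_b)
```

Because the range uses only the two extreme values, it says nothing about how the remaining marks are distributed between them.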
| \(X\) | \(X - \text{mean}\) | \((X - \text{mean})^2\) |
|---|---|---|
| 1 | -5 | 25 |
| 3 | -3 | 9 |
| 14 | 8 | 64 |
| Total: 18 | | 98 |
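The table builds the sum of squared deviations step by step, and the same steps can be sketched in Python. The table stops at the total of 98; dividing by n - 1 to get a sample variance, and taking its square root for the standard deviation, is an assumption made here for illustration (a population formula would divide by n instead).

```python
from math import sqrt

xs = [1, 3, 14]
n = len(xs)

x_bar = sum(xs) / n                       # mean = 18 / 3
deviations = [x - x_bar for x in xs]      # X - mean
squared = [d ** 2 for d in deviations]    # (X - mean)^2
sum_sq = sum(squared)                     # total of the last column

# Assumption: sample formulas, dividing by n - 1
sample_variance = sum_sq / (n - 1)
sample_sd = sqrt(sample_variance)

print(x_bar, deviations, squared, sum_sq, sample_variance, sample_sd)
```

The deviations always sum to zero, which is why they are squared before being added up.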
For interpreting a known value of the standard deviation